permutation tests #43

enryH · 2025-09-16T12:38:05Z

Description

Add a subpackage (as new model) to core.

Tasks Checklist

The folder names defines the name of the subpackage (module)
Add the user-facing functions to the __init__.py in the new folder, so that
they are available when the subpackage is imported.
Used pydantic since only dictionary outputs in my learning example...
Create Pandera schema in a file with subpackage name in the src/acore/types folder.
Optimal is to have only one output schema of results per subpackage or module.
Add a relatively small and public dataset to the data folder, or reuse an existing one for testing
Create an api example jupyter notebook in the docs/api_examples folder with that data
Use jupytext to sync the Jupyter notebook with a Python script
Update index.rst file in the docs folder with the new example
Create test script in the /test folder with the name of the subpackage or module
using pytest or unittests to test your new functionality.

- saved processed dataset to example_data - removed data cleaning code from notebook - added descriptions and permutation explanation in markdown

enryH

I will test it now.

enryH · 2025-11-07T08:39:35Z

src/acore/microbiome/internal_functions.py

@@ -0,0 +1,53 @@
+# preprocessing


I would rename this to something similar to compositional.py - I would like to not have it specific to microbiome at best, but we can group it here for now. Maybe you could move the file as compositional.py to transform/compositional.py?

enryH · 2025-11-07T08:42:15Z

example_data/mgnify/Ju2018_GO0017001_enf_inf_paired.csv

@@ -0,0 +1,25 @@
+eff_id,inf_id,sampling_location,sampling_read,eff_abundance,inf_abundance


could you add a README.md to the folder briefly putting a link where this originates from? And maybe if it is a subset, a small preprocessing script?

Copilot

Pull Request Overview

This PR adds comprehensive permutation testing functionality to the acore library, including implementations for paired and independent sample tests, chi-squared tests, and Jaccard similarity calculations. The changes also include documentation, examples, and minor code quality improvements.

Implements three main permutation test functions: paired_permutation, indep_permutation, and chi2_permutation
Adds Jaccard similarity calculations for graph comparison
Includes comprehensive test suite and tutorial documentation with real-world metagenomics example

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 17 comments.

Show a summary per file

File	Description
src/acore/permutation_test/init.py	Core permutation test implementations with support for multiple metrics
src/acore/permutation_test/internal_functions.py	Helper functions for permutation and contingency table operations
src/acore/permutation_test/jaccard.py	Jaccard similarity calculation functions for set comparison
src/acore/types/permutation_test.py	Pydantic model for permutation test results
tests/test_permutation_test.py	Comprehensive test suite for permutation test functions
docs/api_examples/permutation_testing.py	Tutorial demonstrating permutation testing on metagenomics data
docs/api_examples/permutation_testing.ipynb	Jupyter notebook version of the tutorial
docs/index.rst	Updated documentation index to include permutation testing tutorial
src/acore/microbiome/internal_functions.py	New CLR transformation functions for compositional data
example_data/mgnify/Ju2018_GO0017001_enf_inf_paired.csv	Example dataset for tutorial
src/acore/multiple_testing/init.py	Code style fix: changed membership test to use `not in` operator
src/acore/enrichment_analysis/init.py	Code style fix: changed membership tests to use `not in` operator
src/acore/decomposition/umap.py	Added blank line for formatting consistency
src/acore/io/uniprot/uniprot.py	Removed extra blank lines
CONTRIBUTING.rst	Fixed installation command quoting and corrected PR template path

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-07T08:49:05Z

tests/test_permutation_test.py

+        print(result)
+        print(ttest_rel(cond1, cond2))


Debug print statements should be removed from the test code. These print statements (lines 50-51) will clutter the test output.

Suggested change

print(result)

print(ttest_rel(cond1, cond2))

Copilot · 2025-11-07T08:49:06Z

docs/api_examples/permutation_testing.py

+#
+# In this notebook we will demonstrate how to use acore's permutation testing functions on metagenomics data collected by [Ju and colleagues (2018)](https://doi.org/10.1038/s41396-018-0277-8).
+#
+# The samples in this demo were collected from wastewaster treatment plant inffluent (MGYS00005056) and effluent (MGYS00005058).


Multiple spelling errors in this line: "wastewaster" should be "wastewater" and "inffluent" should be "influent".

Suggested change

# The samples in this demo were collected from wastewaster treatment plant inffluent (MGYS00005056) and effluent (MGYS00005058).

# The samples in this demo were collected from wastewater treatment plant influent (MGYS00005056) and effluent (MGYS00005058).

Copilot · 2025-11-07T08:49:06Z

docs/api_examples/permutation_testing.py

+# ## Data preparation details
+#
+# ### Downloading
+# The analysed samples were downloaded via the [MGnify API](https://www.ebi.ac.uk/metagenomics/api/docs/). The inffluent (INF) and effluent (EFFF) datasets have paired samples and we also needed to download the sample metadata (also available via Mgnify API) to assign the correct pairing.


Multiple spelling errors: "inffluent" should be "influent" and "EFFF" should be "EFF".

Copilot · 2025-11-07T08:49:06Z

docs/api_examples/permutation_testing.py

+#
+# The permutation test compares the actual observed chosen metric (e.g., t-statistic, mean difference) with metrics calculated when the dataset values are randomly shuffled permutations of the dataset.
+#
+# If we do 100 permutations of our data (although we should do a bunch more) and only 1 of those permutations falsely showed a larger effect size than the actual observed effect than it suggests there is a 1/100 chance (p value of 0.01) of the observed effect sizze having occurred by chance.


Spelling error: "sizze" should be "size".

Suggested change

# If we do 100 permutations of our data (although we should do a bunch more) and only 1 of those permutations falsely showed a larger effect size than the actual observed effect than it suggests there is a 1/100 chance (p value of 0.01) of the observed effect sizze having occurred by chance.

# If we do 100 permutations of our data (although we should do a bunch more) and only 1 of those permutations falsely showed a larger effect size than the actual observed effect than it suggests there is a 1/100 chance (p value of 0.01) of the observed effect size having occurred by chance.

Copilot · 2025-11-07T08:49:07Z

docs/api_examples/permutation_testing.ipynb

+    "## Data preparation details\n",
+    "\n",
+    "### Downloading\n",
+    "The analysed samples were downloaded via the [MGnify API](https://www.ebi.ac.uk/metagenomics/api/docs/). The inffluent (INF) and effluent (EFFF) datasets have paired samples and we also needed to download the sample metadata (also available via Mgnify API) to assign the correct pairing.\n",


Multiple spelling errors: "inffluent" should be "influent" and "EFFF" should be "EFF".

Copilot · 2025-11-07T08:49:09Z

tests/test_permutation_test.py

+def random_choice_equal():
+    return np.random.choice(
+        ["A", "B", "C", "D"], size=100, replace=True, p=[0.25, 0.25, 0.25, 0.25]
+    )
+
+
+@pytest.fixture
+def random_choice_unequal():
+    return np.random.choice(
+        ["A", "B", "C", "D"], size=100, replace=True, p=[0.1, 0.25, 0.05, 0.6]
+    )


Test fixtures that use random number generation without a fixed seed can lead to non-reproducible test failures. Consider using np.random.default_rng(seed=...) or np.random.seed() to ensure reproducibility.

Copilot · 2025-11-07T08:49:09Z

docs/api_examples/permutation_testing.py

+# %%
+from acore.permutation_test import paired_permutation
+
+# optional choice of random number generatorfor repro


Missing space in comment: "generatorfor" should be "generator for".

Suggested change

# optional choice of random number generatorfor repro

# optional choice of random number generator for repro

Copilot · 2025-11-07T08:49:10Z

docs/api_examples/permutation_testing.ipynb

+    "\n",
+    "In this notebook we will demonstrate how to use acore's permutation testing functions on metagenomics data collected by [Ju and colleagues (2018)](https://doi.org/10.1038/s41396-018-0277-8).\n",
+    "\n",
+    "The samples in this demo were collected from wastewaster treatment plant inffluent (MGYS00005056) and effluent (MGYS00005058).\n",


Multiple spelling errors in this line: "wastewaster" should be "wastewater" and "inffluent" should be "influent".

Suggested change

"The samples in this demo were collected from wastewaster treatment plant inffluent (MGYS00005056) and effluent (MGYS00005058).\n",

"The samples in this demo were collected from wastewater treatment plant influent (MGYS00005056) and effluent (MGYS00005058).\n",

Copilot · 2025-11-07T08:49:10Z

docs/api_examples/permutation_testing.ipynb

+   "source": [
+    "from acore.permutation_test import paired_permutation\n",
+    "\n",
+    "# optional choice of random number generatorfor repro\n",


Missing space in comment: "generatorfor" should be "generator for".

Suggested change

"# optional choice of random number generatorfor repro\n",

"# optional choice of random number generator for repro\n",

Copilot · 2025-11-07T08:49:10Z

tests/test_permutation_test.py

+from acore import permutation_test as pt
+import pytest
+import numpy as np
+from scipy.stats import ttest_rel, ttest_ind, mannwhitneyu, wilcoxon


Import of 'ttest_ind' is not used.
Import of 'mannwhitneyu' is not used.
Import of 'wilcoxon' is not used.

Suggested change

from scipy.stats import ttest_rel, ttest_ind, mannwhitneyu, wilcoxon

from scipy.stats import ttest_rel

docs/api_examples/permutation_testing.py

enryH · 2025-11-07T09:13:59Z

@angelphanth Definitely 'squash and merge` this PR.

Co-authored-by: Henry Webel <[email protected]>

angelphanth added 23 commits September 9, 2025 22:37

init internal functions for permutation testing

13b1b21

user facing in init file

cdb1705

🚧 some tests

a13be15

Merge branch 'main' into anglup-learning

13149df

🐛 degeneracy check for paired permutations

5a5a42a

use internal functions

179ce05

📝 copilot docstrings

a144f7e

tests for accept reject null

e66b6b3

init tutorial

53d5f46

diff conditions were diff mgys studies

0a7e33a

other go term

dc648f6

Merge branch 'main' into anglup-learning

719b9e3

🚧 instead saving processed dataset to example_data/mgnify

5492203

📝docs: Update API examples for permutation testing

8ebab69

- saved processed dataset to example_data - removed data cleaning code from notebook - added descriptions and permutation explanation in markdown

jupytext of the permutation testing demo notebook

b6d049c

💄style: lint and reformatd code

74255e3

Merge branch 'main' into anglup-learning

afc2283

🐛fix: update PR module link in contributing guide

23a79ff

🐛 fix: add quotation marks to command

2194b08

✨ feat: add pydantic model for perm test output

5b0124e

✨ feat: add pydantic model for perm test output

83bbfda

📝 docs: jupytexted the notebook

e614049

💄 style: lint and format

6bea62d

angelphanth marked this pull request as ready for review October 30, 2025 13:27

enryH commented Nov 7, 2025

View reviewed changes

enryH requested a review from Copilot November 7, 2025 08:42

Copilot AI reviewed Nov 7, 2025

View reviewed changes

enryH commented Nov 7, 2025

View reviewed changes

docs/api_examples/permutation_testing.py Outdated Show resolved Hide resolved

enryH commented Nov 7, 2025

View reviewed changes

docs/api_examples/permutation_testing.py Outdated Show resolved Hide resolved

angelphanth and others added 2 commits November 7, 2025 13:40

Update docs/api_examples/permutation_testing.py

f27cf2e

Co-authored-by: Henry Webel <[email protected]>

Update docs/api_examples/permutation_testing.py

8415f0f

Co-authored-by: Henry Webel <[email protected]>

		@@ -0,0 +1,25 @@
		eff_id,inf_id,sampling_location,sampling_read,eff_abundance,inf_abundance

	# The samples in this demo were collected from wastewaster treatment plant inffluent (MGYS00005056) and effluent (MGYS00005058).
	# The samples in this demo were collected from wastewater treatment plant influent (MGYS00005056) and effluent (MGYS00005058).

	# If we do 100 permutations of our data (although we should do a bunch more) and only 1 of those permutations falsely showed a larger effect size than the actual observed effect than it suggests there is a 1/100 chance (p value of 0.01) of the observed effect sizze having occurred by chance.
	# If we do 100 permutations of our data (although we should do a bunch more) and only 1 of those permutations falsely showed a larger effect size than the actual observed effect than it suggests there is a 1/100 chance (p value of 0.01) of the observed effect size having occurred by chance.

	# optional choice of random number generatorfor repro
	# optional choice of random number generator for repro

	"The samples in this demo were collected from wastewaster treatment plant inffluent (MGYS00005056) and effluent (MGYS00005058).\n",
	"The samples in this demo were collected from wastewater treatment plant influent (MGYS00005056) and effluent (MGYS00005058).\n",

	"# optional choice of random number generatorfor repro\n",
	"# optional choice of random number generator for repro\n",

	from scipy.stats import ttest_rel, ttest_ind, mannwhitneyu, wilcoxon
	from scipy.stats import ttest_rel

permutation tests #43

Are you sure you want to change the base?

permutation tests #43

Uh oh!

Conversation

enryH commented Sep 16, 2025 • edited by angelphanth Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Tasks Checklist

Uh oh!

enryH left a comment

Choose a reason for hiding this comment

Uh oh!

enryH Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

enryH Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

enryH commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

enryH commented Sep 16, 2025 •

edited by angelphanth

Loading